Azure
Synapse SQL Pool Query Hint
Query hints are a great way to enhance query performance and
force the optimizer to perform certain operations that an engineer may deem to be
more optimal. The example below will show you how a query that can potentially
take over an hour to run will run around 15 minutes by leveraging a query hint.
Enforcing the importance of query hints in a Data Engineers repertoire.
Test Case:
In this test case, we have the below query that takes over
an hour to execute on a DW100c instance. We are leveraging a large resource
class to execute the query as well. Result set cache is disabled from the
session level to ensure cache data is not being used.
------------------------------------------Start----------------------------------------------
SET RESULT_SET_CACHING OFF
SELECT
Coalesce(p.[ModelName], p.[EnglishProductName])
AS [Model]
,g.City AS ResellerCity
,g.StateProvinceName
AS StateProvince
,Year(f.OrderDate) AS CalendarYear
,CASE
WHEN Month(f.OrderDate)
< 7 THEN Year(f.OrderDate)
ELSE Year(f.OrderDate)
+ 1
END
AS FiscalYear
-- Fiscal year
runs from Jul to June)
,Month(f.OrderDate) AS [Month]
,Sum(f.OrderQuantity) AS Quantity
,Sum(f.ExtendedAmount) AS Amount
,Approx_count_distinct(f.SalesOrderNumber) AS UniqueOrders
FROM
[dbo].[FactResellerSales_HASH_CCI] f
INNER JOIN [dbo].[DimReseller]
r
ON f.ResellerKey = r.ResellerKey
INNER JOIN [dbo].[DimGeography]
g
ON r.GeographyKey = g.GeographyKey
INNER JOIN [dbo].[DimProduct]
p
ON
f.[ProductKey] = p.[ProductKey]
GROUP BY
Coalesce(p.[ModelName], p.[EnglishProductName])
,g.City
,g.StateProvinceName
,Year(f.OrderDate)
,CASE
WHEN Month(f.OrderDate)
< 7 THEN Year(f.OrderDate)
ELSE Year(f.OrderDate)
+ 1
END
,Month(f.OrderDate)
ORDER BY Amount DESC
------------------------------------------End----------------------------------------------
We first purposely neglect to create any user defined statistics
and disabled the auto creation of system statistics on the database to also
reinforce the importance of statistics on the data warehouse system. Within 25 minutes
of executing, we received a “Msg 1105, Level 16, State 2, Line 31” indicating
that TempDB could not allocate any additional space.
We proceed to create statistics with a sample size of 50% to
confirm if we can at least get the query to run and not fail.
------------------------------------------Start----------------------------------------------
CREATE STATISTICS [FactResellerSales_HASH_CCI_ord_nq]
ON
[dbo].[FactResellerSales_HASH_CCI]([OrderDate],OrderQuantity,ExtendedAmount ) WITH
SAMPLE 50 PERCENT;
CREATE STATISTICS [FactResellerSales_HASH_CCI_prodk]
ON [dbo].[FactResellerSales_HASH_CCI]([ProductKey]
) WITH SAMPLE 50 PERCENT;
CREATE STATISTICS [FactResellerSales_HASH_CCI_resek]
ON [dbo].[FactResellerSales_HASH_CCI]([ResellerKey]
) WITH SAMPLE 50 PERCENT;
CREATE STATISTICS [DimReseller_resek_stat]
ON [dbo].[DimReseller]([ResellerKey] ) WITH SAMPLE 50 PERCENT;
CREATE STATISTICS [DimReseller_geogr_stat]
ON [dbo].[DimReseller]([GeographyKey] ) WITH SAMPLE 50 PERCENT;
CREATE STATISTICS [DimGeography_geogr_stat]
ON [dbo].[DimGeography](GeographyKey ) WITH SAMPLE 50 PERCENT;
CREATE STATISTICS [DimGeography_city_stat]
ON [dbo].[DimGeography](City ) WITH
SAMPLE 50 PERCENT;
CREATE STATISTICS [DimGeography_engprod_stat]
ON [dbo].[DimGeography](EnglishProductName
) WITH SAMPLE 50 PERCENT;
CREATE STATISTICS [DimProduct_prodkey_stat]
ON [dbo].[DimProduct]([ProductKey] ) WITH SAMPLE 50 PERCENT;
CREATE STATISTICS [DimProduct_prodname_stat]
ON [dbo].[DimProduct]([EnglishProductName]
) WITH SAMPLE 50 PERCENT;
CREATE STATISTICS [DimProduct_ModelName_stat]
ON [dbo].[DimProduct]([ModelName] ) WITH SAMPLE 50 PERCENT;
------------------------------------------End----------------------------------------------
While
the query is able to run, we are unable to get it to
run within an hour before our ETL loading job need to remove sessions and run.
As
you can see below, there are three expensive sort operations which are taking
predominately most of the cost of the query.
We proceed to adjust the query and add the HASH GROUP hint
to force the optimizer to perform a hash aggregate with the grouping data. We
now see instead of three sort operations, the data is
only sorted at the end and only once. In addition, the query only takes 15
minutes to complete. Which is a huge performance improvement for our runtimes,
which fits into the ETL window.
SELECT
Coalesce(p.[ModelName], p.[EnglishProductName])
AS [Model]
,g.City AS ResellerCity
,g.StateProvinceName
AS StateProvince
,Year(f.OrderDate) AS CalendarYear
,CASE
WHEN Month(f.OrderDate)
< 7 THEN Year(f.OrderDate)
ELSE Year(f.OrderDate)
+ 1
END
AS FiscalYear
-- Fiscal year
runs from Jul to June)
,Month(f.OrderDate) AS [Month]
,Sum(f.OrderQuantity) AS Quantity
,Sum(f.ExtendedAmount) AS Amount
,Approx_count_distinct(f.SalesOrderNumber) AS UniqueOrders
FROM
[dbo].[FactResellerSales_HASH_CCI] f
INNER JOIN [dbo].[DimReseller]
r
ON f.ResellerKey = r.ResellerKey
INNER JOIN [dbo].[DimGeography]
g
ON r.GeographyKey = g.GeographyKey
INNER JOIN [dbo].[DimProduct]
p
ON f.[ProductKey] = p.[ProductKey]
GROUP BY
Coalesce(p.[ModelName], p.[EnglishProductName])
,g.City
,g.StateProvinceName
,Year(f.OrderDate)
,CASE
WHEN Month(f.OrderDate)
< 7 THEN Year(f.OrderDate)
ELSE Year(f.OrderDate)
+ 1
END
,Month(f.OrderDate)
ORDER BY Amount DESC
OPTION(HASH GROUP)
Conclusion:
As we have seen above, query hints play an integral role in truly
enhancing workloads. We went from over an hour runtime, to around 15 minutes.
It’s important for Data Engineers to exhaust all optimization options to ensure
a performant running query in a data warehouse.
Rule of thumb, for group by and order by queries, confirm if
OPTION(HASH GROUP) would allow for further
optimization. For statements with several join operations, using smaller
dimension tables as the leading table in the join, confirm if OPTION(FORCE ORDER) would enhance the query run times. For a
list of additional query hints, please visit: OPTION Clause (Transact-SQL) - SQL
Server | Microsoft Docs
DISCLAIMER: Sample Code is provided for the purpose of illustration only and is not intended to be used in a production environment unless thorough testing has been conducted by the app and database teams. THIS SAMPLE CODE AND ANY RELATED INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE. We grant You a nonexclusive, royalty-free right to use and modify the Sample Code and to reproduce and distribute the object code form of the Sample Code, provided that. You agree: (i) to not use Our name, logo, or trademarks to market Your software product in which the Sample Code is embedded; (ii) to include a valid copyright notice on Your software product in which the Sample Code is embedded; and (iii) to indemnify, hold harmless, and defend Us and Our suppliers from and against any claims or lawsuits, including attorneys’ fees, that arise or result from the use or distribution or use of the Sample Code.